The columns in the BioLog dataset are as follows:
| Column ID | Description |
|---|---|
| Sample.ID | The location the sample was taken from. There are 2 water samples (Clear_Creek, Waste_Water) and 2 soil samples (Soil_1, Soil_2). |
| Rep | The experimental replicate. 3 replicates for each combination of experimental variables. |
| Well | The well number on the BioLog plate. |
| Dilution | The dilution factor of the sample (0.001, 0.01, or 0.1). |
| Substrate | The name of the carbon source in that well. “Water” is the negative control. |
| Hr_24 | The light absorbance value after 24 hours of incubation. |
| Hr_48 | The light absorbance value after 48 hours of incubation. |
| Hr_144 | The light absorbance value after 144 hours of incubation. |
Here are the first six rows of the BioLog dataset:
head(BioLog)
##     Sample.ID Rep Well Dilution                   Substrate Hr_24 Hr_48 Hr_144
## 1 Clear_Creek   1   A1    0.001                       Water 0.000 0.000  0.000
## 2 Clear_Creek   1   A2    0.001       β-Methyl-D- Glucoside 0.004 0.005  0.004
## 3 Clear_Creek   1   A3    0.001 D-Galactonic Acid γ-Lactone 0.008 0.007  0.001
## 4 Clear_Creek   1   A4    0.001                  L-Arginine 0.003 0.002  0.000
## 5 Clear_Creek   1   B1    0.001   Pyruvic Acid Methyl Ester 0.002 0.000  0.007
## 6 Clear_Creek   1   B2    0.001                    D-Xylose 0.011 0.008  0.021
The first question we want to answer is whether the samples are fundamentally different from each other.
One way to do this is by making a plot:
Overall, they don’t look very different. Let’s try a facet wrap for each substrate.
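The code for these plots is not shown. As one possibility, the sketch below assumes ggplot2 (used later in this document) and draws a boxplot of 144-hour absorbance per sample, then repeats it with one panel per substrate. The data frame `toy` is a small simulated stand-in for BioLog with made-up values, not the real measurements.

```r
library(ggplot2)

# Simulated stand-in for BioLog (made-up values, same column names)
set.seed(1)
toy <- expand.grid(
  Sample.ID = c("Clear_Creek", "Waste_Water", "Soil_1", "Soil_2"),
  Rep = 1:3,
  Substrate = c("Water", "D-Xylose", "L-Arginine")
)
# Water is the negative control, so its absorbance is set to 0
toy$Hr_144 <- ifelse(toy$Substrate == "Water", 0, runif(nrow(toy), 0, 2))

# One box per sample across all substrates
p_all <- ggplot(toy, aes(x = Sample.ID, y = Hr_144)) +
  geom_boxplot()

# Same plot with one panel per substrate
p_sub <- p_all + facet_wrap(~Substrate)
```

Calling print(p_all) or print(p_sub) draws the plots; with the real data, `toy` would be replaced by BioLog.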
For some substrates, the water and soil samples look very different from each other. We can also run t-tests comparing each sample with the others. Let's start by subsetting the data into one data frame per sample ID.
# One data frame per sample ID
Clear_creek <- BioLog[BioLog$Sample.ID == "Clear_Creek", ]
Soil1 <- BioLog[BioLog$Sample.ID == "Soil_1", ]
Soil2 <- BioLog[BioLog$Sample.ID == "Soil_2", ]
Waste_water <- BioLog[BioLog$Sample.ID == "Waste_Water", ]
Next, we run Welch two-sample t-tests on the 144-hour absorbance, comparing the two water samples with each other and the two soil samples with each other.
t.test(Clear_creek$Hr_144, Waste_water$Hr_144)
##
## Welch Two Sample t-test
##
## data: Clear_creek$Hr_144 and Waste_water$Hr_144
## t = -4.5818, df = 555.18, p-value = 5.696e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.3675931 -0.1469902
## sample estimates:
## mean of x mean of y
## 0.3511493 0.6084410
t.test(Soil1$Hr_144, Soil2$Hr_144)
##
## Welch Two Sample t-test
##
## data: Soil1$Hr_144 and Soil2$Hr_144
## t = 0.88529, df = 573.97, p-value = 0.3764
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.07108979 0.18776340
## sample estimates:
## mean of x mean of y
## 1.399306 1.340969
The two water samples differ significantly in mean 144-hour absorbance (p = 5.7e-06), while the two soil samples do not (p = 0.38).
Next, let's compare each water sample with the first soil sample.
t.test(Clear_creek$Hr_144, Soil1$Hr_144)
##
## Welch Two Sample t-test
##
## data: Clear_creek$Hr_144 and Soil1$Hr_144
## t = -17.867, df = 539.62, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.1633916 -0.9329209
## sample estimates:
## mean of x mean of y
## 0.3511493 1.3993056
t.test(Waste_water$Hr_144, Soil1$Hr_144)
##
## Welch Two Sample t-test
##
## data: Waste_water$Hr_144 and Soil1$Hr_144
## t = -12.47, df = 571.07, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.9154274 -0.6663018
## sample estimates:
## mean of x mean of y
## 0.608441 1.399306
Both water samples differ significantly from the first soil sample (p < 2.2e-16 in each case). Let's repeat the comparison with the second soil sample:
t.test(Clear_creek$Hr_144, Soil2$Hr_144)
##
## Welch Two Sample t-test
##
## data: Clear_creek$Hr_144 and Soil2$Hr_144
## t = -16.794, df = 537.82, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.1055958 -0.8740431
## sample estimates:
## mean of x mean of y
## 0.3511493 1.3409688
t.test(Waste_water$Hr_144, Soil2$Hr_144)
##
## Welch Two Sample t-test
##
## data: Waste_water$Hr_144 and Soil2$Hr_144
## t = -11.504, df = 570.44, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.8575905 -0.6074650
## sample estimates:
## mean of x mean of y
## 0.608441 1.340969
Both water samples also differ significantly from the second soil sample (p < 2.2e-16), matching the results for Soil_1.
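Running the comparisons one at a time does not adjust for multiple testing. A swapped-in alternative, not part of the original analysis, is base R's pairwise.t.test(), which runs every pairwise Welch test (pool.sd = FALSE) in one call and adjusts the p-values. The sketch uses simulated data whose group means loosely mirror the estimates above; the Holm adjustment is an illustrative choice.

```r
# Simulated stand-in for BioLog$Hr_144 (made-up values)
set.seed(42)
toy <- data.frame(
  Sample.ID = rep(c("Clear_Creek", "Waste_Water", "Soil_1", "Soil_2"), each = 30),
  Hr_144 = c(rnorm(30, 0.35, 0.3), rnorm(30, 0.61, 0.3),
             rnorm(30, 1.40, 0.3), rnorm(30, 1.34, 0.3))
)

# All six pairwise Welch t-tests (non-pooled SD), Holm-adjusted p-values
res <- pairwise.t.test(toy$Hr_144, toy$Sample.ID,
                       pool.sd = FALSE, p.adjust.method = "holm")
res$p.value
```

The result is a lower-triangular matrix of adjusted p-values, one cell per pair of samples.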
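Next question: are the soil samples significantly different from the water samples?
Because this involves several pairwise comparisons among four samples, a one-way ANOVA followed by Tukey's honestly significant difference (HSD) test is a better fit than repeated t-tests, since Tukey's method keeps the family-wise error rate at 5%.
First, we fit the ANOVA and check whether the samples differ overall:
mod1 <- aov(Hr_144 ~ Sample.ID, data = BioLog)
summary(mod1)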
## Df Sum Sq Mean Sq F value Pr(>F)
## Sample.ID 3 238.3 79.44 147.2 <2e-16 ***
## Residuals 1148 619.6 0.54
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(mod1)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Hr_144 ~ Sample.ID, data = BioLog)
##
## $Sample.ID
## diff lwr upr p adj
## Soil_1-Clear_Creek 1.04815625 0.89065127 1.20566123 0.0000000
## Soil_2-Clear_Creek 0.98981944 0.83231447 1.14732442 0.0000000
## Waste_Water-Clear_Creek 0.25729167 0.09978669 0.41479664 0.0001665
## Soil_2-Soil_1 -0.05833681 -0.21584178 0.09916817 0.7761474
## Waste_Water-Soil_1 -0.79086458 -0.94836956 -0.63335961 0.0000000
## Waste_Water-Soil_2 -0.73252778 -0.89003276 -0.57502280 0.0000000
The ANOVA summary (F = 147.2, p < 2e-16) shows that at least one sample mean differs from the others, and TukeyHSD(mod1) shows which pairs of samples differ. Soil_1 and Soil_2 are the only pair that is not significantly different (p adj = 0.78); every other pair differs significantly (p adj < 0.001).
This can also be illustrated with a graph:
plot(TukeyHSD(mod1))
Pairs whose confidence interval crosses 0 are not significantly different. Only one interval, Soil_2-Soil_1, contains 0, which matches the Tukey table.
Below are the Tukey test results for each individual substrate.
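The object X is never defined in this section. Calls of the form X$`2-Hydroxy Benzoic Acid`$Hr_144 suggest it is a named list with one data frame per substrate, which split() would produce; that construction is an assumption. The sketch builds such a list from simulated data and runs the same ANOVA and Tukey test on every substrate with lapply(), instead of one call per substrate.

```r
# Simulated stand-in for BioLog (made-up values)
set.seed(7)
toy <- expand.grid(
  Sample.ID = c("Clear_Creek", "Soil_1", "Soil_2", "Waste_Water"),
  Rep = 1:3,
  Substrate = c("D-Xylose", "L-Arginine")
)
toy$Hr_144 <- runif(nrow(toy), 0, 2)

# Assumed construction of X: a named list, one data frame per substrate
X <- split(toy, toy$Substrate)

# One-way ANOVA + Tukey HSD for Sample.ID within every substrate
tukey_by_substrate <- lapply(X, function(d) {
  TukeyHSD(aov(Hr_144 ~ Sample.ID, data = d))
})
tukey_by_substrate[["D-Xylose"]]
```

With the real data this would presumably be X <- split(BioLog, BioLog$Substrate), after which tukey_by_substrate[["L-Serine"]] would hold the L-Serine table.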
2-Hydroxy Benzoic Acid:
TukeyHSD(aov(X$`2-Hydroxy Benzoic Acid`$Hr_144~X$`2-Hydroxy Benzoic Acid`$Sample.ID))
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = X$`2-Hydroxy Benzoic Acid`$Hr_144 ~ X$`2-Hydroxy Benzoic Acid`$Sample.ID)
##
## $`X$`2-Hydroxy Benzoic Acid`$Sample.ID`
## diff lwr upr p adj
## Soil_1-Clear_Creek 1.41855556 0.7626908 2.0744203 0.0000094
## Soil_2-Clear_Creek 1.17211111 0.5162463 1.8279759 0.0001770
## Waste_Water-Clear_Creek -0.01755556 -0.6734203 0.6383092 0.9998601
## Soil_2-Soil_1 -0.24644444 -0.9023092 0.4094203 0.7401743
## Waste_Water-Soil_1 -1.43611111 -2.0919759 -0.7802463 0.0000076
## Waste_Water-Soil_2 -1.18966667 -1.8455315 -0.5338019 0.0001438
For this substrate, every soil-water comparison is significant (p adj < 0.001), while the two soil samples and the two water samples do not differ from each other.
4-Hydroxy Benzoic Acid:
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = X$`4-Hydroxy Benzoic Acid`$Hr_144 ~ X$`4-Hydroxy Benzoic Acid`$Sample.ID)
##
## $`X$`4-Hydroxy Benzoic Acid`$Sample.ID`
## diff lwr upr p adj
## Soil_1-Clear_Creek 1.22477778 0.3710130 2.0785425 0.0025816
## Soil_2-Clear_Creek 1.21022222 0.3564575 2.0639870 0.0029262
## Waste_Water-Clear_Creek 0.05744444 -0.7963203 0.9112092 0.9978037
## Soil_2-Soil_1 -0.01455556 -0.8683203 0.8392092 0.9999638
## Waste_Water-Soil_1 -1.16733333 -2.0210981 -0.3135686 0.0042194
## Waste_Water-Soil_2 -1.15277778 -2.0065425 -0.2990130 0.0047718
The pattern matches 2-Hydroxy Benzoic Acid: each soil sample differs significantly from each water sample (p adj < 0.005), but samples of the same type do not.
D-Cellobiose:
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = X$`D-Cellobiose`$Hr_144 ~ X$`D-Cellobiose`$Sample.ID)
##
## $`X$`D-Cellobiose`$Sample.ID`
## diff lwr upr p adj
## Soil_1-Clear_Creek 1.1000000 0.03010711 2.1698929 0.0420455
## Soil_2-Clear_Creek 0.7356667 -0.33422622 1.8055596 0.2638688
## Waste_Water-Clear_Creek 0.3753333 -0.69455956 1.4452262 0.7779948
## Soil_2-Soil_1 -0.3643333 -1.43422622 0.7055596 0.7929858
## Waste_Water-Soil_1 -0.7246667 -1.79455956 0.3452262 0.2760710
## Waste_Water-Soil_2 -0.3603333 -1.43022622 0.7095596 0.7983405
Only the Soil_1 vs. Clear_Creek comparison is significant (p adj = 0.042); the other pairs, including Soil_2 vs. Clear_Creek (p adj = 0.26), are not.
D-Galactonic Acid γ-Lactone:
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = X$`D-Galactonic Acid γ-Lactone`$Hr_144 ~ X$`D-Galactonic Acid γ-Lactone`$Sample.ID)
##
## $`X$`D-Galactonic Acid γ-Lactone`$Sample.ID`
## diff lwr upr p adj
## Soil_1-Clear_Creek 0.3287778 -0.6065223 1.26407787 0.7769498
## Soil_2-Clear_Creek 0.6863333 -0.2489668 1.62163343 0.2135625
## Waste_Water-Clear_Creek -0.2608889 -1.1961890 0.67441120 0.8735360
## Soil_2-Soil_1 0.3575556 -0.5777445 1.29285565 0.7299670
## Waste_Water-Soil_1 -0.5896667 -1.5249668 0.34563343 0.3362306
## Waste_Water-Soil_2 -0.9472222 -1.8825223 -0.01192213 0.0462444
Only Soil_2 vs. Waste_Water is significant (p adj = 0.046); no other pair is.
D-Galacturonic Acid:
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = X$`D-Galacturonic Acid`$Hr_144 ~ X$`D-Galacturonic Acid`$Sample.ID)
##
## $`X$`D-Galacturonic Acid`$Sample.ID`
## diff lwr upr p adj
## Soil_1-Clear_Creek 0.6980000 -0.3543332 1.7503332 0.2933179
## Soil_2-Clear_Creek 0.8248889 -0.2274443 1.8772221 0.1673271
## Waste_Water-Clear_Creek 0.1696667 -0.8826665 1.2219998 0.9716474
## Soil_2-Soil_1 0.1268889 -0.9254443 1.1792221 0.9877314
## Waste_Water-Soil_1 -0.5283333 -1.5806665 0.5239998 0.5326739
## Waste_Water-Soil_2 -0.6552222 -1.7075554 0.3971110 0.3469661
None of the samples are significantly different from each other in this substrate.
D-Glucosaminic Acid:
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = X$`D-Glucosaminic Acid`$Hr_144 ~ X$`D-Glucosaminic Acid`$Sample.ID)
##
## $`X$`D-Glucosaminic Acid`$Sample.ID`
## diff lwr upr p adj
## Soil_1-Clear_Creek 1.53411111 0.8008127 2.2674096 0.0000163
## Soil_2-Clear_Creek 1.46444444 0.7311460 2.1977429 0.0000344
## Waste_Water-Clear_Creek 0.09244444 -0.6408540 0.8257429 0.9860351
## Soil_2-Soil_1 -0.06966667 -0.8029651 0.6636318 0.9938997
## Waste_Water-Soil_1 -1.44166667 -2.1749651 -0.7083682 0.0000439
## Waste_Water-Soil_2 -1.37200000 -2.1052984 -0.6387016 0.0000921
The two water samples are not significantly different from each other, and neither are the two soil samples. All other comparisons are significant.
D-Malic Acid (listed as D-Mallic Acid in the dataset):
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = X$`D-Mallic Acid`$Hr_144 ~ X$`D-Mallic Acid`$Sample.ID)
##
## $`X$`D-Mallic Acid`$Sample.ID`
## diff lwr upr p adj
## Soil_1-Clear_Creek 0.41400000 -0.30203546 1.1300355 0.4114539
## Soil_2-Clear_Creek 0.76644444 0.05040898 1.4824799 0.0322204
## Waste_Water-Clear_Creek -0.07133333 -0.78736880 0.6447021 0.9929850
## Soil_2-Soil_1 0.35244444 -0.36359102 1.0684799 0.5489886
## Waste_Water-Soil_1 -0.48533333 -1.20136880 0.2307021 0.2754933
## Waste_Water-Soil_2 -0.83777778 -1.55381324 -0.1217423 0.0167689
Soil_2 differs significantly from both water samples (p adj = 0.032 vs. Clear_Creek and 0.017 vs. Waste_Water). No other pair is significantly different.
D-Mannitol:
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = X$`D-Mannitol`$Hr_144 ~ X$`D-Mannitol`$Sample.ID)
##
## $`X$`D-Mannitol`$Sample.ID`
## diff lwr upr p adj
## Soil_1-Clear_Creek 1.5651111 0.48048812 2.6497341 0.0024258
## Soil_2-Clear_Creek 1.0444444 -0.04017855 2.1290674 0.0624899
## Waste_Water-Clear_Creek 0.8233333 -0.26128966 1.9079563 0.1891527
## Soil_2-Soil_1 -0.5206667 -1.60528966 0.5639563 0.5692646
## Waste_Water-Soil_1 -0.7417778 -1.82640077 0.3428452 0.2682227
## Waste_Water-Soil_2 -0.2211111 -1.30573410 0.8635119 0.9452371
Only Clear_Creek vs. Soil_1 is significant (p adj = 0.0024); Soil_2 vs. Clear_Creek is borderline (p adj = 0.062).
D-Xylose:
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = X$`D-Xylose`$Hr_144 ~ X$`D-Xylose`$Sample.ID)
##
## $`X$`D-Xylose`$Sample.ID`
## diff lwr upr p adj
## Soil_1-Clear_Creek 1.47800000 0.7789908 2.1770092 0.0000137
## Soil_2-Clear_Creek 1.78822222 1.0892130 2.4872314 0.0000004
## Waste_Water-Clear_Creek 0.05788889 -0.6411203 0.7568981 0.9959330
## Soil_2-Soil_1 0.31022222 -0.3887870 1.0092314 0.6298162
## Waste_Water-Soil_1 -1.42011111 -2.1191203 -0.7211019 0.0000262
## Waste_Water-Soil_2 -1.73033333 -2.4293425 -1.0313241 0.0000008
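Same pattern as 2-Hydroxy Benzoic Acid: every soil-water pair is significant, while the two soil samples and the two water samples do not differ.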
D.L -α-Glycerol Phosphate:
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = X$`D.L -α-Glycerol Phosphate`$Hr_144 ~ X$`D.L -α-Glycerol Phosphate`$Sample.ID)
##
## $`X$`D.L -α-Glycerol Phosphate`$Sample.ID`
## diff lwr upr p adj
## Soil_1-Clear_Creek 0.23588889 -0.004638569 0.4764163 0.0561963
## Soil_2-Clear_Creek 0.17155556 -0.068971902 0.4120830 0.2350363
## Waste_Water-Clear_Creek 0.21744444 -0.023083014 0.4579719 0.0879629
## Soil_2-Soil_1 -0.06433333 -0.304860791 0.1761941 0.8865110
## Waste_Water-Soil_1 -0.01844444 -0.258971902 0.2220830 0.9967617
## Waste_Water-Soil_2 0.04588889 -0.194638569 0.2864163 0.9544335
None of the results are significant.
Glucose-1-Phosphate:
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = X$`Glucose-1-Phosphate`$Hr_144 ~ X$`Glucose-1-Phosphate`$Sample.ID)
##
## $`X$`Glucose-1-Phosphate`$Sample.ID`
## diff lwr upr p adj
## Soil_1-Clear_Creek 0.8290000 0.04189107 1.6161089 0.0359319
## Soil_2-Clear_Creek 0.6822222 -0.10488671 1.4693311 0.1082668
## Waste_Water-Clear_Creek 0.2132222 -0.57388671 1.0003311 0.8827085
## Soil_2-Soil_1 -0.1467778 -0.93388671 0.6403311 0.9572482
## Waste_Water-Soil_1 -0.6157778 -1.40288671 0.1713311 0.1686235
## Waste_Water-Soil_2 -0.4690000 -1.25610893 0.3181089 0.3851510
Soil_1 vs. Clear_Creek is the only significant comparison (p adj = 0.036).
Glycogen:
TukeyHSD(aov(X$Glycogen$Hr_144~X$Glycogen$Sample.ID))
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = X$Glycogen$Hr_144 ~ X$Glycogen$Sample.ID)
##
## $`X$Glycogen$Sample.ID`
## diff lwr upr p adj
## Soil_1-Clear_Creek 1.14000000 0.2577321 2.0222679 0.0072219
## Soil_2-Clear_Creek 1.09033333 0.2080655 1.9726012 0.0107045
## Waste_Water-Clear_Creek 0.62577778 -0.2564901 1.5080457 0.2393446
## Soil_2-Soil_1 -0.04966667 -0.9319345 0.8326012 0.9987084
## Waste_Water-Soil_1 -0.51422222 -1.3964901 0.3680457 0.4044388
## Waste_Water-Soil_2 -0.46455556 -1.3468234 0.4177123 0.4925593
The comparisons between the clear creek and soil samples were significant. No others were.
Glycyl-L-Glutamic Acid:
TukeyHSD(aov(data = X$`Glycyl-L-Glutamic Acid`, Hr_144~Sample.ID))
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Hr_144 ~ Sample.ID, data = X$`Glycyl-L-Glutamic Acid`)
##
## $Sample.ID
## diff lwr upr p adj
## Soil_1-Clear_Creek 1.4578889 0.7325680 2.1832098 0.0000311
## Soil_2-Clear_Creek 0.9242222 0.1989013 1.6495432 0.0081919
## Waste_Water-Clear_Creek 0.3012222 -0.4240987 1.0265432 0.6769513
## Soil_2-Soil_1 -0.5336667 -1.2589876 0.1916543 0.2115967
## Waste_Water-Soil_1 -1.1566667 -1.8819876 -0.4313457 0.0007778
## Waste_Water-Soil_2 -0.6230000 -1.3483209 0.1023209 0.1129817
Both of the soil samples compared to the clear creek were significant, as was the comparison between waste water and soil 1.
i-Erythritol (listed as i-Erythitol in the dataset):
TukeyHSD(aov(data = X$`i-Erythitol`, Hr_144~Sample.ID))
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Hr_144 ~ Sample.ID, data = X$`i-Erythitol`)
##
## $Sample.ID
## diff lwr upr p adj
## Soil_1-Clear_Creek 1.0673333 0.2454951 1.88917156 0.0068931
## Soil_2-Clear_Creek 1.1838889 0.3620507 2.00572712 0.0024703
## Waste_Water-Clear_Creek 0.1825556 -0.6392827 1.00439379 0.9307327
## Soil_2-Soil_1 0.1165556 -0.7052827 0.93839379 0.9803645
## Waste_Water-Soil_1 -0.8847778 -1.7066160 -0.06293955 0.0309719
## Waste_Water-Soil_2 -1.0013333 -1.8231716 -0.17949510 0.0120702
Water-to-water and soil-to-soil comparisons are not significantly different; all other comparisons are.
Itaconic Acid:
TukeyHSD(aov(data = X$`Itaconic Acid`, Hr_144~Sample.ID))
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Hr_144 ~ Sample.ID, data = X$`Itaconic Acid`)
##
## $Sample.ID
## diff lwr upr p adj
## Soil_1-Clear_Creek 1.5534444 0.9661354 2.1407535 0.0000002
## Soil_2-Clear_Creek 1.4155556 0.8282465 2.0028646 0.0000014
## Waste_Water-Clear_Creek -0.1370000 -0.7243090 0.4503090 0.9209467
## Soil_2-Soil_1 -0.1378889 -0.7251979 0.4494202 0.9195618
## Waste_Water-Soil_1 -1.6904444 -2.2777535 -1.1031354 0.0000000
## Waste_Water-Soil_2 -1.5525556 -2.1398646 -0.9652465 0.0000002
Same pattern as i-Erythritol and 2-Hydroxy Benzoic Acid.
L-Arginine:
TukeyHSD(aov(data = X$`L-Arginine`, Hr_144~Sample.ID))
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Hr_144 ~ Sample.ID, data = X$`L-Arginine`)
##
## $Sample.ID
## diff lwr upr p adj
## Soil_1-Clear_Creek 1.040888889 0.1591145 1.9226633 0.0156314
## Soil_2-Clear_Creek 1.048333333 0.1665590 1.9301077 0.0147634
## Waste_Water-Clear_Creek 0.282444444 -0.5993299 1.1642188 0.8212838
## Soil_2-Soil_1 0.007444444 -0.8743299 0.8892188 0.9999956
## Waste_Water-Soil_1 -0.758444444 -1.6402188 0.1233299 0.1122443
## Waste_Water-Soil_2 -0.765888889 -1.6476633 0.1158855 0.1071850
Only the two Clear_Creek vs. soil comparisons are significant.
L-Asparagine:
TukeyHSD(aov(data = X$`L-Asparganine`, Hr_144~Sample.ID))
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Hr_144 ~ Sample.ID, data = X$`L-Asparganine`)
##
## $Sample.ID
## diff lwr upr p adj
## Soil_1-Clear_Creek 1.3960000 0.2765247 2.5154753 0.0099061
## Soil_2-Clear_Creek 1.2522222 0.1327470 2.3716975 0.0235947
## Waste_Water-Clear_Creek 0.4495556 -0.6699197 1.5690308 0.6992470
## Soil_2-Soil_1 -0.1437778 -1.2632530 0.9756975 0.9852602
## Waste_Water-Soil_1 -0.9464444 -2.0659197 0.1730308 0.1215209
## Waste_Water-Soil_2 -0.8026667 -1.9221419 0.3168086 0.2310053
Similar conclusion to L-Arginine.
L-Phenylalanine:
TukeyHSD(aov(Hr_144~Sample.ID, data = X$`L-Phenylalanine`))
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Hr_144 ~ Sample.ID, data = X$`L-Phenylalanine`)
##
## $Sample.ID
## diff lwr upr p adj
## Soil_1-Clear_Creek 1.6405556 0.7955115 2.4855996 0.0000532
## Soil_2-Clear_Creek 1.4356667 0.5906226 2.2807107 0.0003502
## Waste_Water-Clear_Creek 0.2298889 -0.6151552 1.0749330 0.8814170
## Soil_2-Soil_1 -0.2048889 -1.0499330 0.6401552 0.9123347
## Waste_Water-Soil_1 -1.4106667 -2.2557107 -0.5656226 0.0004397
## Waste_Water-Soil_2 -1.2057778 -2.0508218 -0.3607337 0.0027316
This had a similar conclusion to 2-Hydroxy Benzoic Acid.
Note: the remaining substrates are listed without commentary. A comparison is significant when its p adj (the Tukey-adjusted p-value) is below 0.05.
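Rather than reading each table by eye, the Tukey results can be gathered into one data frame and filtered at the p adj < 0.05 threshold. This sketch again assumes X is a per-substrate list built with split(), and uses simulated values in which the soil samples are given higher absorbance.

```r
# Simulated stand-in for BioLog (made-up values)
set.seed(11)
toy <- expand.grid(
  Sample.ID = c("Clear_Creek", "Soil_1", "Soil_2", "Waste_Water"),
  Rep = 1:3,
  Substrate = c("D-Xylose", "L-Arginine")
)
# Soil samples get +1.5 so that soil-water pairs differ
toy$Hr_144 <- runif(nrow(toy), 0, 0.3) + ifelse(grepl("Soil", toy$Sample.ID), 1.5, 0)

X <- split(toy, toy$Substrate)

# Flatten each Tukey table into rows of (substrate, pair, p adj)
rows <- lapply(names(X), function(s) {
  tk <- TukeyHSD(aov(Hr_144 ~ Sample.ID, data = X[[s]]))$Sample.ID
  data.frame(Substrate = s, Pair = rownames(tk), p.adj = tk[, "p adj"],
             row.names = NULL)
})
tukey_table <- do.call(rbind, rows)

# Keep only comparisons below the 0.05 threshold
significant <- tukey_table[tukey_table$p.adj < 0.05, ]
significant
```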
L-Serine:
TukeyHSD(aov(Hr_144~Sample.ID, X$`L-Serine`))
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Hr_144 ~ Sample.ID, data = X$`L-Serine`)
##
## $Sample.ID
## diff lwr upr p adj
## Soil_1-Clear_Creek 1.2451111 0.3546204 2.1356018 0.0033695
## Soil_2-Clear_Creek 1.3884444 0.4979537 2.2789352 0.0010180
## Waste_Water-Clear_Creek 0.6746667 -0.2158241 1.5651574 0.1905045
## Soil_2-Soil_1 0.1433333 -0.7471574 1.0338241 0.9717820
## Waste_Water-Soil_1 -0.5704444 -1.4609352 0.3200463 0.3226013
## Waste_Water-Soil_2 -0.7137778 -1.6042685 0.1767129 0.1529666
L-Threonine:
TukeyHSD(aov(Hr_144~Sample.ID, X$`L-Threonine`))
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Hr_144 ~ Sample.ID, data = X$`L-Threonine`)
##
## $Sample.ID
## diff lwr upr p adj
## Soil_1-Clear_Creek 1.4614444 0.7328603 2.1900286 0.0000321
## Soil_2-Clear_Creek 1.1394444 0.4108603 1.8680286 0.0009823
## Waste_Water-Clear_Creek 0.2886667 -0.4399174 1.0172508 0.7078888
## Soil_2-Soil_1 -0.3220000 -1.0505841 0.4065841 0.6328971
## Waste_Water-Soil_1 -1.1727778 -1.9013619 -0.4441937 0.0006941
## Waste_Water-Soil_2 -0.8507778 -1.5793619 -0.1221937 0.0170310
N-Acetyl-D-Glucosamine:
TukeyHSD(aov(Hr_144~Sample.ID, X$`N-Acetyl-D-Glucosamine`))
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Hr_144 ~ Sample.ID, data = X$`N-Acetyl-D-Glucosamine`)
##
## $Sample.ID
## diff lwr upr p adj
## Soil_1-Clear_Creek 1.3644444 0.44289647 2.2859924 0.0018358
## Soil_2-Clear_Creek 1.1478889 0.22634092 2.0694369 0.0100032
## Waste_Water-Clear_Creek 0.8701111 -0.05143686 1.7916591 0.0698034
## Soil_2-Soil_1 -0.2165556 -1.13810353 0.7049924 0.9193684
## Waste_Water-Soil_1 -0.4943333 -1.41588131 0.4272146 0.4766431
## Waste_Water-Soil_2 -0.2777778 -1.19932575 0.6437702 0.8461446
Phenylethylamine:
TukeyHSD(aov(Hr_144~Sample.ID, X$Phenylethylamine))
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Hr_144 ~ Sample.ID, data = X$Phenylethylamine)
##
## $Sample.ID
## diff lwr upr p adj
## Soil_1-Clear_Creek 1.2682222 0.4057969 2.13064759 0.0019785
## Soil_2-Clear_Creek 1.6035556 0.7411302 2.46598093 0.0001009
## Waste_Water-Clear_Creek 0.3846667 -0.4777587 1.24709204 0.6261055
## Soil_2-Soil_1 0.3353333 -0.5270920 1.19775870 0.7196500
## Waste_Water-Soil_1 -0.8835556 -1.7459809 -0.02113018 0.0430062
## Waste_Water-Soil_2 -1.2188889 -2.0813143 -0.35646352 0.0030174
Putrescine:
TukeyHSD(aov(Hr_144~Sample.ID, X$Putrescine))
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Hr_144 ~ Sample.ID, data = X$Putrescine)
##
## $Sample.ID
## diff lwr upr p adj
## Soil_1-Clear_Creek 0.62222222 0.01185711 1.2325873 0.0443785
## Soil_2-Clear_Creek 0.68655556 0.07619045 1.2969207 0.0226468
## Waste_Water-Clear_Creek 0.45477778 -0.15558733 1.0651429 0.2024029
## Soil_2-Soil_1 0.06433333 -0.54603178 0.6746984 0.9917214
## Waste_Water-Soil_1 -0.16744444 -0.77780955 0.4429207 0.8788248
## Waste_Water-Soil_2 -0.23177778 -0.84214289 0.3785873 0.7339687
Pyruvic Acid Methyl Ester:
TukeyHSD(aov(Hr_144~Sample.ID, X$`Pyruvic Acid Methyl Ester`))
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Hr_144 ~ Sample.ID, data = X$`Pyruvic Acid Methyl Ester`)
##
## $Sample.ID
## diff lwr upr p adj
## Soil_1-Clear_Creek 1.08722222 0.09811341 2.0763310 0.0267730
## Soil_2-Clear_Creek 1.00711111 0.01800229 1.9962199 0.0447158
## Waste_Water-Clear_Creek 0.40866667 -0.58044215 1.3977755 0.6804312
## Soil_2-Soil_1 -0.08011111 -1.06921993 0.9089977 0.9961922
## Waste_Water-Soil_1 -0.67855556 -1.66766437 0.3105533 0.2657209
## Waste_Water-Soil_2 -0.59844444 -1.58755326 0.3906644 0.3718222
Tween 40:
TukeyHSD(aov(Hr_144~Sample.ID, X$`Tween 40`))
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Hr_144 ~ Sample.ID, data = X$`Tween 40`)
##
## $Sample.ID
## diff lwr upr p adj
## Soil_1-Clear_Creek 1.3850000 0.5179582 2.2520418 0.0007621
## Soil_2-Clear_Creek 1.1607778 0.2937360 2.0278196 0.0051809
## Waste_Water-Clear_Creek 0.1405556 -0.7264863 1.0075974 0.9712046
## Soil_2-Soil_1 -0.2242222 -1.0912640 0.6428196 0.8960556
## Waste_Water-Soil_1 -1.2444444 -2.1114863 -0.3774026 0.0025680
## Waste_Water-Soil_2 -1.0202222 -1.8872640 -0.1531804 0.0160350
Tween 80:
TukeyHSD(aov(Hr_144~Sample.ID, X$`Tween 80 `))
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Hr_144 ~ Sample.ID, data = X$`Tween 80 `)
##
## $Sample.ID
## diff lwr upr p adj
## Soil_1-Clear_Creek 0.92488889 0.006564718 1.8432131 0.0478617
## Soil_2-Clear_Creek 0.97722222 0.058898052 1.8955464 0.0335314
## Waste_Water-Clear_Creek 0.77966667 -0.138657504 1.6979908 0.1192092
## Soil_2-Soil_1 0.05233333 -0.865990837 0.9706575 0.9986604
## Waste_Water-Soil_1 -0.14522222 -1.063546393 0.7731019 0.9731690
## Waste_Water-Soil_2 -0.19755556 -1.115879726 0.7207686 0.9365216
Water (control):
TukeyHSD(aov(Hr_144~Sample.ID, X$Water))
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Hr_144 ~ Sample.ID, data = X$Water)
##
## $Sample.ID
## diff lwr upr p adj
## Soil_1-Clear_Creek 0 0 0 NaN
## Soil_2-Clear_Creek 0 0 0 NaN
## Waste_Water-Clear_Creek 0 0 0 NaN
## Soil_2-Soil_1 0 0 0 NaN
## Waste_Water-Soil_1 0 0 0 NaN
## Waste_Water-Soil_2 0 0 0 NaN
All Water (control) wells read 0 at 144 hours, so every group has zero variance; the Tukey statistic is then 0/0, which is why the p-values are NaN.
α-Cyclodextrin:
TukeyHSD(aov(Hr_144~Sample.ID, X$`α-Cyclodextrin`))
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Hr_144 ~ Sample.ID, data = X$`α-Cyclodextrin`)
##
## $Sample.ID
## diff lwr upr p adj
## Soil_1-Clear_Creek 1.45100000 0.6667222 2.2352778 0.0001084
## Soil_2-Clear_Creek 1.36744444 0.5831666 2.1517222 0.0002481
## Waste_Water-Clear_Creek 0.28555556 -0.4987222 1.0698334 0.7580864
## Soil_2-Soil_1 -0.08355556 -0.8678334 0.7007222 0.9914567
## Waste_Water-Soil_1 -1.16544444 -1.9497222 -0.3811666 0.0017633
## Waste_Water-Soil_2 -1.08188889 -1.8661667 -0.2976111 0.0038625
α-D-Lactose:
TukeyHSD(aov(Hr_144~Sample.ID, X$`α-D-Lactose`))
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Hr_144 ~ Sample.ID, data = X$`α-D-Lactose`)
##
## $Sample.ID
## diff lwr upr p adj
## Soil_1-Clear_Creek 1.1403333 0.1539778 2.1266888 0.0184074
## Soil_2-Clear_Creek 0.8175556 -0.1687999 1.8039111 0.1327186
## Waste_Water-Clear_Creek 0.4443333 -0.5420222 1.4306888 0.6186785
## Soil_2-Soil_1 -0.3227778 -1.3091333 0.6635777 0.8117723
## Waste_Water-Soil_1 -0.6960000 -1.6823555 0.2903555 0.2433658
## Waste_Water-Soil_2 -0.3732222 -1.3595777 0.6131333 0.7360795
α-Ketobutyric Acid:
TukeyHSD(aov(Hr_144~Sample.ID, X$`α-Ketobutyric Acid`))
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Hr_144 ~ Sample.ID, data = X$`α-Ketobutyric Acid`)
##
## $Sample.ID
## diff lwr upr p adj
## Soil_1-Clear_Creek 1.15111111 0.4879489 1.8142734 0.0002635
## Soil_2-Clear_Creek 1.10044444 0.4372822 1.7636067 0.0004746
## Waste_Water-Clear_Creek -0.05000000 -0.7131623 0.6131623 0.9969203
## Soil_2-Soil_1 -0.05066667 -0.7138289 0.6124956 0.9967969
## Waste_Water-Soil_1 -1.20111111 -1.8642734 -0.5379489 0.0001468
## Waste_Water-Soil_2 -1.15044444 -1.8136067 -0.4872822 0.0002656
β-Methyl-D- Glucoside:
TukeyHSD(aov(Hr_144~Sample.ID, X$`β-Methyl-D- Glucoside`))
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Hr_144 ~ Sample.ID, data = X$`β-Methyl-D- Glucoside`)
##
## $Sample.ID
## diff lwr upr p adj
## Soil_1-Clear_Creek -0.02477778 -0.9356003 0.8860448 0.9998532
## Soil_2-Clear_Creek 0.13744444 -0.7733781 1.0482670 0.9765323
## Waste_Water-Clear_Creek -0.29766667 -1.2084892 0.6131559 0.8123712
## Soil_2-Soil_1 0.16222222 -0.7486003 1.0730448 0.9624174
## Waste_Water-Soil_1 -0.27288889 -1.1837114 0.6379337 0.8484510
## Waste_Water-Soil_2 -0.43511111 -1.3459337 0.4757114 0.5731574
γ-Hydroxybutyric Acid:
TukeyHSD(aov(Hr_144~Sample.ID, X$`γ-Hydroxybutyric Acid`))
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Hr_144 ~ Sample.ID, data = X$`γ-Hydroxybutyric Acid`)
##
## $Sample.ID
## diff lwr upr p adj
## Soil_1-Clear_Creek 0.34244444 -0.1386477 0.8235366 0.2365960
## Soil_2-Clear_Creek 0.24355556 -0.2375366 0.7246477 0.5257464
## Waste_Water-Clear_Creek 0.25788889 -0.2232033 0.7389810 0.4772327
## Soil_2-Soil_1 -0.09888889 -0.5799810 0.3822033 0.9439758
## Waste_Water-Soil_1 -0.08455556 -0.5656477 0.3965366 0.9637960
## Waste_Water-Soil_2 0.01433333 -0.4667588 0.4954255 0.9998072
The next question is to determine which substrates are driving any differences between the samples.
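Based on the per-substrate Tukey tests above, most substrates show at least one significant difference between samples. The exceptions, where no pair of samples differs significantly, are D-Galacturonic Acid, D.L -α-Glycerol Phosphate, β-Methyl-D- Glucoside, and γ-Hydroxybutyric Acid (plus the Water control, which has no variation). The substrates that separate soil from water most consistently, with all four soil-versus-water comparisons significant, are 2-Hydroxy Benzoic Acid, 4-Hydroxy Benzoic Acid, D-Glucosaminic Acid, D-Xylose, i-Erythritol, Itaconic Acid, L-Phenylalanine, L-Threonine, Phenylethylamine, Tween 40, α-Cyclodextrin, and α-Ketobutyric Acid.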
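One way to make this ranking more concrete, not used in the original analysis, is to sort the substrates by the p-value of their one-way ANOVA F-test for Sample.ID. The sketch assumes the per-substrate list X used above (here built with split() from simulated data in which only D-Xylose separates soil from water).

```r
# Simulated stand-in for BioLog (made-up values)
set.seed(3)
toy <- expand.grid(
  Sample.ID = c("Clear_Creek", "Soil_1", "Soil_2", "Waste_Water"),
  Rep = 1:3,
  Substrate = c("D-Xylose", "L-Arginine", "Glycogen")
)
# Only D-Xylose gets a soil effect in this toy example
toy$Hr_144 <- runif(nrow(toy), 0, 1) +
  ifelse(toy$Substrate == "D-Xylose" & grepl("Soil", toy$Sample.ID), 2, 0)

X <- split(toy, toy$Substrate)

# F-test p-value for Sample.ID within each substrate (first row of the table)
anova_p <- sapply(X, function(d) {
  summary(aov(Hr_144 ~ Sample.ID, data = d))[[1]][["Pr(>F)"]][1]
})

# Smallest p-value first: substrates that most strongly separate the samples
sort(anova_p)
```

With about 30 substrates tested, applying p.adjust(anova_p, method = "holm") before interpreting the ranking would be a reasonable addition.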
We also want to know if the dilution factor changes the results. The first step would be to graph the data based on sample ID and see if there is a difference between the dilution factors for each sample ID.
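library(ggplot2)
library(plotly)
# Absorbance at 144 hours against dilution, one panel per sample
ggplotly(ggplot(BioLog, aes(x = Dilution, y = Hr_144)) +
  geom_point() +
  facet_wrap(~Sample.ID))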
We can see that there is an upward trend in the clear creek absorbance at 144 hours, but a downward trend in the soil samples. The waste water data has no clear trend.
Next, we run a Tukey test on the samples separately for each dilution factor. This shows whether the sample comparisons at any single dilution differ from the overall result.
Here is the original Tukey test for comparison:
TukeyHSD(aov(BioLog$Hr_144~BioLog$Sample.ID))
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = BioLog$Hr_144 ~ BioLog$Sample.ID)
##
## $`BioLog$Sample.ID`
## diff lwr upr p adj
## Soil_1-Clear_Creek 1.04815625 0.89065127 1.20566123 0.0000000
## Soil_2-Clear_Creek 0.98981944 0.83231447 1.14732442 0.0000000
## Waste_Water-Clear_Creek 0.25729167 0.09978669 0.41479664 0.0001665
## Soil_2-Soil_1 -0.05833681 -0.21584178 0.09916817 0.7761474
## Waste_Water-Soil_1 -0.79086458 -0.94836956 -0.63335961 0.0000000
## Waste_Water-Soil_2 -0.73252778 -0.89003276 -0.57502280 0.0000000
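Like X, the list Y is not defined in the text. The names Y$`0.001`, Y$`0.01` and Y$`0.1` are consistent with splitting the data on Dilution, since split() on a numeric column names each element after its value. A sketch under that assumption, on simulated data:

```r
# Simulated stand-in for BioLog (made-up values)
set.seed(5)
toy <- expand.grid(
  Sample.ID = c("Clear_Creek", "Soil_1", "Soil_2", "Waste_Water"),
  Rep = 1:3,
  Dilution = c(0.001, 0.01, 0.1)
)
toy$Hr_144 <- runif(nrow(toy), 0, 2)

# Assumed construction of Y: one data frame per dilution factor.
# split() on a numeric column names the elements "0.001", "0.01", "0.1".
Y <- split(toy, toy$Dilution)

# Same Tukey test as in the text, run once per dilution
tukey_by_dilution <- lapply(Y, function(d) TukeyHSD(aov(Hr_144 ~ Sample.ID, data = d)))
tukey_by_dilution[["0.1"]]
```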
Dilution 0.001:
TukeyHSD(aov(Y$`0.001`$Hr_144~Y$`0.001`$Sample.ID))
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Y$`0.001`$Hr_144 ~ Y$`0.001`$Sample.ID)
##
## $`Y$`0.001`$Sample.ID`
## diff lwr upr p adj
## Soil_1-Clear_Creek 1.7095000 1.46763326 1.9513667 0.0000000
## Soil_2-Clear_Creek 1.8459687 1.60410201 2.0878355 0.0000000
## Waste_Water-Clear_Creek 0.3015208 0.05965409 0.5433876 0.0076525
## Soil_2-Soil_1 0.1364687 -0.10539799 0.3783355 0.4653226
## Waste_Water-Soil_1 -1.4079792 -1.64984591 -1.1661124 0.0000000
## Waste_Water-Soil_2 -1.5444479 -1.78631466 -1.3025812 0.0000000
These results match the overall test: Soil_1 vs. Soil_2 is the only pair of samples that is not significantly different (p adj = 0.47).
Dilution 0.01:
TukeyHSD(aov(Y$`0.01`$Hr_144~Y$`0.01`$Sample.ID))
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Y$`0.01`$Hr_144 ~ Y$`0.01`$Sample.ID)
##
## $`Y$`0.01`$Sample.ID`
## diff lwr upr p adj
## Soil_1-Clear_Creek 1.3339479 1.1065712 1.56132461 0.0000000
## Soil_2-Clear_Creek 1.1850208 0.9576441 1.41239753 0.0000000
## Waste_Water-Clear_Creek 0.3319167 0.1045400 0.55929336 0.0010948
## Soil_2-Soil_1 -0.1489271 -0.3763038 0.07844961 0.3302869
## Waste_Water-Soil_1 -1.0020313 -1.2294079 -0.77465455 0.0000000
## Waste_Water-Soil_2 -0.8531042 -1.0804809 -0.62572747 0.0000000
The conclusion is again the same. Compared with the 0.001 dilution, the Waste_Water vs. Clear_Creek p adj is smaller (0.0011 vs. 0.0077), and so is the Soil_2 vs. Soil_1 p adj (0.33 vs. 0.47), although the latter remains non-significant.
Dilution 0.1:
TukeyHSD(aov(Y$`0.1`$Hr_144~Y$`0.1`$Sample.ID))
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Y$`0.1`$Hr_144 ~ Y$`0.1`$Sample.ID)
##
## $`Y$`0.1`$Sample.ID`
## diff lwr upr p adj
## Soil_1-Clear_Creek 0.10102083 -0.14426546 0.34630712 0.7124140
## Soil_2-Clear_Creek -0.06153125 -0.30681754 0.18375504 0.9164532
## Waste_Water-Clear_Creek 0.13843750 -0.10684879 0.38372379 0.4650681
## Soil_2-Soil_1 -0.16255208 -0.40783837 0.08273421 0.3198141
## Waste_Water-Soil_1 0.03741667 -0.20786962 0.28270296 0.9792916
## Waste_Water-Soil_2 0.19996875 -0.04531754 0.44525504 0.1537544
At the 0.1 dilution, no pair of samples differs significantly (all p adj > 0.15).
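Comparing three separate Tukey tables is informal. A more direct test, which the original analysis does not use, is a two-way ANOVA with a Sample.ID by Dilution interaction: a significant interaction term means the differences between samples depend on the dilution. The sketch treats Dilution as a factor and uses simulated data in which the soils read high only at the 0.001 dilution; mod2 is a hypothetical name.

```r
# Simulated stand-in for BioLog (made-up values)
set.seed(9)
toy <- expand.grid(
  Sample.ID = c("Clear_Creek", "Soil_1", "Soil_2", "Waste_Water"),
  Rep = 1:3,
  Dilution = c(0.001, 0.01, 0.1)
)
# Soils get +1.5 only at the 0.001 dilution
toy$Hr_144 <- runif(nrow(toy), 0, 0.5) +
  ifelse(grepl("Soil", toy$Sample.ID) & toy$Dilution == 0.001, 1.5, 0)

# Dilution as a categorical factor; the Sample.ID:factor(Dilution) row
# tests whether sample differences change with dilution
mod2 <- aov(Hr_144 ~ Sample.ID * factor(Dilution), data = toy)
summary(mod2)
```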
Because the 0.1 dilution gives a different result from the overall test while the 0.001 and 0.01 dilutions agree with it, it is reasonable to conclude that the dilution factor may influence the results. Without the 0.1 dilution, this would be harder to detect.
___
Finally, we want to find out whether the control samples show any sign of contamination. There are two ways to do this:
+ look at the absorbances of the control (in this case, the Water substrate), or
+ check the absorbances of all the substrates for negative values.
For the first method, we will subset the BioLog data set into just the control values, then see if there are any non-zero values by using the unique() function:
Control <- BioLog[BioLog$Substrate == "Water",]
unique(as.integer(as.character(Control$Hr_24)))
## [1] 0
unique(as.integer(as.character(Control$Hr_48)))
## [1] 0
unique(as.integer(as.character(Control$Hr_144)))
## [1] 0
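All three time points return only 0. Note, however, that as.integer() truncates toward zero, so any control absorbance between 0 and 1 (such as 0.004) would also be reported as 0; calling unique() on the numeric columns directly is a stricter check.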
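The following sketch, on made-up control readings, illustrates how the as.integer() conversion can hide a small non-zero absorbance and shows a stricter check that works on the numeric values directly.

```r
# Hypothetical control readings: one well has a small non-zero absorbance
Control <- data.frame(Hr_24 = c(0, 0, 0.004), Hr_48 = c(0, 0, 0), Hr_144 = c(0, 0, 0))

# as.integer() truncates 0.004 to 0, so the non-zero reading disappears
unique(as.integer(as.character(Control$Hr_24)))

# Working on the numeric values keeps it: returns c(0, 0.004)
unique(Control$Hr_24)

# TRUE for any column that contains a non-zero reading
sapply(Control, function(x) any(x != 0))
```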
Because the plates may have been blanked against the Water well in each trial, which would set the control readings to zero regardless of contamination, we also apply the second method to the entire BioLog dataset. We use logical comparisons and sum the result (TRUE counts as 1 and FALSE as 0) to count absorbance values below zero for any substrate, at all three time points.
sum(BioLog$Hr_144 <0)
## [1] 0
sum(BioLog$Hr_48 <0)
## [1] 0
sum(BioLog$Hr_24 <0)
## [1] 0
No substrate has a negative absorbance at any time point. Together with the zero readings in the Water wells, this gives no evidence of contamination in the control.
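The three sum() calls for negative values can be collapsed into one. Comparing a data frame with 0 gives a logical matrix, and colSums() counts the TRUE values in each column. BioLog_toy is a hypothetical three-row stand-in for BioLog with made-up values.

```r
# Hypothetical stand-in for BioLog with one negative reading in Hr_48
BioLog_toy <- data.frame(Hr_24 = c(0, 0.01, 0.2),
                         Hr_48 = c(0, -0.002, 0.3),
                         Hr_144 = c(0, 0.5, 0.9))

# Number of negative absorbances at each time point
colSums(BioLog_toy[c("Hr_24", "Hr_48", "Hr_144")] < 0)
```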